Update to transformers v5 (#30566)
Conversation
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Code Review
This pull request aims to update the transformers library to version 5. The changes correctly update the version in requirements/test.in and requirements/nightly_torch_test.txt, and also add the --pre flag to uv pip install in the Dockerfile to allow installation of the release candidate. However, there is a critical oversight: requirements/common.txt still contains a constraint transformers < 5. This will lead to build failures for any configuration that relies on common.txt. This file must be updated to allow transformers v5 for this PR to be mergeable.
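The constraint interaction the review describes can be sketched with a small, hedged example. `vkey` below is an illustrative helper for simple version strings, not pip's actual resolver, but it shows why the release candidate needs both `--pre` and the cap in `requirements/common.txt` lifted:

```python
import re

# Hedged sketch: why `transformers < 5` blocks the v5 release candidate.
# `vkey` is an illustrative sort key for versions like "4.57.5" or
# "5.0.0rc1"; it is not pip's resolver.

def vkey(version: str) -> tuple:
    """Release tuple plus a pre-release rank: an rc sorts before its final."""
    m = re.fullmatch(r"(\d+(?:\.\d+)*)(?:rc(\d+))?", version)
    release = tuple(int(part) for part in m.group(1).split("."))
    pre = (0, int(m.group(2))) if m.group(2) else (1,)
    return release + pre

# Ordering alone puts the RC between 4.57.5 and the final 5.0.0 ...
assert vkey("4.57.5") < vkey("5.0.0rc1") < vkey("5.0.0")
# ... but PEP 440 says the specifier `<5` never matches a pre-release of
# 5.0.0, even when pre-releases are enabled with --pre, so the cap in
# requirements/common.txt must be removed as well.
```

This is why adding `--pre` to `uv pip install` alone is not sufficient: the exclusive `<5` bound excludes `5.0.0rc1` regardless of the pre-release flag.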
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Documentation preview: https://vllm--30566.org.readthedocs.build/en/30566/
Signed-off-by: khluu <khluu000@gmail.com>
This reverts commit 40742ca.
Disable fused ops (VLLM_CPU_CI_ENV=0) for the untrained tiny-mixtral model on CPU to reduce bfloat16 rounding that causes logprob divergence. Also pass VLLM_CPU_ATTN_SPLIT_KV=0 to the CPU CI docker container.
Co-authored-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
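The bfloat16 rounding divergence this commit works around can be pictured with a small stand-alone sketch. `to_bf16` below emulates bfloat16 by truncating a float32 bit pattern to its top 16 bits; it is an illustration, not torch's actual kernels:

```python
import struct

# Illustrative sketch of bfloat16 rounding: truncate a float32 bit pattern
# to its top 16 bits, which is how bfloat16 drops mantissa precision.
# This is a stand-in for demonstration, not torch's implementation.

def to_bf16(x: float) -> float:
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

vals = [0.1, 0.2, 0.3, 0.4]
fused = to_bf16(sum(vals))                       # round once, at the end
split = to_bf16(to_bf16(vals[0] + vals[1]) + to_bf16(vals[2] + vals[3]))

# Mathematically identical reductions disagree once every intermediate is
# rounded to an 8-bit mantissa -- the kind of drift that makes logprobs of
# an untrained model diverge between fused and unfused code paths.
print(fused, split)   # 1.0 vs 0.99609375
```

For a trained model the logits are far from ties, so this noise is invisible; for an untrained tiny model with near-uniform logits it is enough to flip top-k ordering, which is why the CI disables the fused path there.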
I believe we took care of all the CI failures with the transformers v5 upgrade. Thanks @bigPYJ1151 for the CPU fix! Running full CI again now: https://buildkite.com/vllm/ci/builds/61345. Hopefully it's the last one.
The XVERSE tokenizer is incompatible with transformers v5 due to an `add_prefix_space` / `prepend_scheme` mismatch in tokenizer.json that causes loading to fail. Cap at `transformers<=4.57` until upstream fixes it.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
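The kind of mismatch described here can be sketched as a migration of the Metaspace pre-tokenizer entry in tokenizer.json. The mapping of `add_prefix_space` to `prepend_scheme` below is an assumption for illustration (newer `tokenizers` releases use `prepend_scheme` values like `"always"`/`"never"`), not the upstream fix itself:

```python
import json

# Hypothetical sketch: older tokenizer.json files carry `add_prefix_space`
# on the Metaspace pre-tokenizer, while newer `tokenizers` releases expect
# `prepend_scheme`. `migrate` is an illustrative helper, not library code.

legacy = {"type": "Metaspace", "replacement": "\u2581", "add_prefix_space": True}

def migrate(pre_tokenizer: dict) -> dict:
    out = dict(pre_tokenizer)
    if "add_prefix_space" in out and "prepend_scheme" not in out:
        # assumed mapping: True -> "always", False -> "never"
        out["prepend_scheme"] = "always" if out.pop("add_prefix_space") else "never"
    return out

print(json.dumps(migrate(legacy)))
```

A repo whose tokenizer.json carries only the legacy key fails to load under the new schema, which is why the cap is the short-term workaround until the model repo is updated.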
Claude's approach for basics model test (extra init 2)
Move `_get_lora_aux_cuda_stream`, `lora_linear_async`, and the custom op registration out of the `if envs.VLLM_LORA_ENABLE_DUAL_STREAM:` block. The block was evaluated at import time, but test fixtures set the env var via monkeypatch after import, causing NameError / AttributeError when the runtime code tried to call these functions. They are only invoked when `_enable_aux_cuda_stream` is True (checked at runtime), so defining them unconditionally is safe.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
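The import-time vs. run-time gating bug described in this commit can be reduced to a minimal sketch. The module and flag handling below are illustrative stand-ins, not vLLM's actual code (only the name `lora_linear_async` comes from the commit message):

```python
import os

# Minimal reproduction of the import-time env gating bug.
# This snapshot is taken once, when the module is imported:
ENABLE_DUAL_STREAM = os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM") == "1"

# Buggy pattern: the function only exists if the env var was set *before*
# import, so a test that monkeypatches it afterwards hits a NameError when
# runtime code calls it.
if ENABLE_DUAL_STREAM:
    def lora_linear_async_buggy():
        return "dual-stream path"

# Fixed pattern: define unconditionally, consult the flag at call time.
def lora_linear_async():
    if os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM") == "1":
        return "dual-stream path"
    return "default path"

os.environ["VLLM_LORA_ENABLE_DUAL_STREAM"] = "1"  # simulate monkeypatch after import
print(lora_linear_async())  # -> "dual-stream path"
```

The fixed variant costs one environment lookup per call but makes the behavior independent of import order, which is exactly what monkeypatch-based fixtures need.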
Claude's fix for this test: https://buildkite.com/vllm/ci/builds/61345#019d8f38-a3e5-47f5-94aa-031e3b466e29/L3122
Claude's fix for step3 tool parser
Wrap the `get_config()` call in `get_tokenizer()` with `contextlib.suppress` so it gracefully handles paths that don't contain a config.json (e.g. LoRA adapter directories passed as tokenizer paths). The config pre-registration is only needed for custom vllm configs and is irrelevant for adapter or tokenizer-only paths. Fixes `test_quant_model_lora` failure.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
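The pattern this commit describes can be sketched in isolation. `get_config` and `maybe_get_config` below are simplified stand-ins for illustration, not vLLM's implementations:

```python
import contextlib
import json
import os
import tempfile

# Sketch of the fix: attempt the config pre-registration, but tolerate
# paths without a config.json (e.g. LoRA adapter directories passed as
# tokenizer paths). `get_config` is a stand-in, not vLLM's function.

def get_config(path):
    with open(os.path.join(path, "config.json")) as f:
        return json.load(f)

def maybe_get_config(path):
    config = None
    with contextlib.suppress(OSError, json.JSONDecodeError):
        config = get_config(path)  # a missing config.json is fine for adapters
    return config

with tempfile.TemporaryDirectory() as adapter_dir:  # no config.json inside
    print(maybe_get_config(adapter_dir))  # -> None
```

`contextlib.suppress` keeps the happy path flat compared to a try/except block, at the cost of silently swallowing the listed exception types, which is acceptable here because the pre-registration is purely optional.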
Claude's fix for https://buildkite.com/vllm/ci/builds/61345#019d8f38-789f-401f-b021-183d4141b2f8/L2600
This reverts commit e187e72.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
I've pushed 816db8b. The longer term solution is to upstream this override to Transformers, which I have done in huggingface/transformers#45449.
This reverts commit cb03f5d.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
I've moved the fix for …
These models fail with `AttributeError: 'dict' object has no attribute '__name__'` on transformers v5.2+. Add `max_transformers_version="5.1"` until upstream compatibility is fixed.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
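A version cap like the `max_transformers_version="5.1"` described here boils down to a simple comparison. `version_tuple` and `should_skip` below are illustrative helpers, not vLLM's registry API:

```python
# Hedged sketch of a max-version gate: skip a model when the installed
# Transformers is newer than its declared cap. These helpers are
# illustrative, not vLLM's actual `max_transformers_version` machinery.

def version_tuple(version: str) -> tuple:
    # compare on the first two components, e.g. "5.1.3" -> (5, 1)
    return tuple(int(part) for part in version.split(".")[:2])

def should_skip(installed: str, max_supported: str) -> bool:
    # a cap of "5.1" means 5.1.x still runs, while 5.2+ is skipped
    return version_tuple(installed) > version_tuple(max_supported)

assert should_skip("5.2.0", "5.1")
assert not should_skip("5.1.3", "5.1")
```

Note the cap is inclusive: `"5.1"` allows any 5.1.x patch release, matching the "fail on v5.2+" behavior in the commit message.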
Full CI run: https://buildkite.com/vllm/ci/builds/61509
The processing test uses `check_version_reason="vllm"`, so the skip reason must be `"vllm"` not `"hf"` to actually take effect.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: khluu <khluu000@gmail.com>
Changes:
- `5.5.3`
- `0.22.2` (as is required by Transformers `5.0.0`)
- `0.18.1` so that huggingface/peft@41c07f0 is included (guards import of `HybridCache` on Transformers version)
- `1.13.0` so that 4-bit bnb can work on Transformers v5
- `2.3.0` so that state-spaces/mamba@35e927b is included (removes import that was deleted in Transformers v5)
- `0.15.0` as this is the earliest version that supports Transformers v5
- `HF_HUB_DOWNLOAD_TIMEOUT=60` to the CI environment to deal with the shortened timeout in `huggingface-hub>=1` since it switched to `httpx`
- `4.57.5` installed

Some architectures/tests need to be skipped in order to get this upgrade through. We need this upgrade as it is blocking proper support of SoTA architectures released after Transformers v5. This is not a commitment to drop these architectures forever, simply a temporary measure. We plan to restore these architectures/tests following the upgrade.
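The `HF_HUB_DOWNLOAD_TIMEOUT=60` setting mentioned above can be pictured as follows: huggingface-hub reads the env var to size its HTTP timeouts, and the CI raises it. `download_timeout` is a simplified stand-in, not the library's actual code, and the 10-second default is an assumption for illustration:

```python
import os

# Simplified stand-in for how an env-provided timeout might be read;
# not huggingface-hub's actual implementation, and the 10s default is
# an assumption for this sketch.

def download_timeout(default: float = 10.0) -> float:
    raw = os.environ.get("HF_HUB_DOWNLOAD_TIMEOUT")
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default

os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "60"  # as set in the vLLM CI
print(download_timeout())  # -> 60.0
```

Raising the value in CI compensates for `huggingface-hub>=1` switching its HTTP stack to `httpx`, which shortened the effective default timeout.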
Architectures/models that will no longer work after the upgrade:
- `Plamo2ForCausalLM` - Custom model code uses `_tied_weight_keys: list[str]` but Transformers v5 now expects `_tied_weight_keys: dict[str, str]`
- `OpenCUAForConditionalGeneration` - Custom code is not compatible with Transformers v5
- `OpenPanguVLForConditionalGeneration` - `OpenPanguVLVideoProcessorInitKwargs` does not specify `total=False`, making all kwargs required
- `Alibaba-NLP/gte-Qwen2-1.5B-instruct` - numerical issues with this model
- `PaddlePaddle/PaddleOCR-VL` - imports deleted object
- `InternS1ForConditionalGeneration`
- `BAAI/bge-code-v1`
- `XverseForCausalLM`
- `Ovis2_5`
- `Ovis2_6_MoeForCausalLM`
- `MiniCPMO`
- `MiniCPMV`
- `Phi4ForCausalLM`
- `InternLM2VEForCausalLM`
- `HCXVisionForCausalLM`
- `Tarsier2ForConditionalGeneration`
- `SarvamMLAForCausalLM`

Tests that are disabled after upgrade:
- `intern_vl`, `isaac`, `ultravox` because these models are broken in Transformers v5 and therefore the HF reference cannot be generated
- `jinaai/jina-embeddings-v3`
- `OpenGVLab/InternViT-*`
- `InternVisionModel`
- `jinaai/jina-reranker-m0`
- `jinaai/jina-embeddings-v3`
- `nvidia/NVIDIA-Nemotron-Parse-v1.1`
- `ColQwen3`

Supplementary PRs:
Transformers:
First 10:
- `getattr` in `standardize_rope_params` because `rope_parameters` not always present huggingface/transformers#42593
- `RotaryEmbeddingConfigMixin` huggingface/transformers#42517
- `validation_fn` to be `None` in `validate_rope` huggingface/transformers#42601
- `rope_parameters` to empty `dict` if there is something to put in it huggingface/transformers#42651
- `torch.autocast` if it will have an effect huggingface/transformers#42747
- `pad_token_id` huggingface/transformers#43453

Second 10:
- `tied_weight_keys` in-place huggingface/transformers#43619
- `convert_rope_params_to_dict` so it uses `rope_theta` from the config huggingface/transformers#43766
- [`Jamba`] Fallback to slow path and warn instead of error out huggingface/transformers#43889

Third 10:
- [`Mamba`] Fix kernel loading huggingface/transformers#44176
- `from_dict` backward compatibility with old remote code huggingface/transformers#44245

Fourth 10:
- `dtype` for subconfig when `_from_config` huggingface/transformers#44629
- `supports_{tp/pp}_plan` huggingface/transformers#44696
- `set_encoder` huggingface/transformers#44698

Fifth 10:
- `layer_types` type hint for `AFMoE` and `Llama4` huggingface/transformers#44874
- `SizeDict` huggingface/transformers#44884
- `image_processing_utils_fast` huggingface/transformers#44897
- [`vllm x v5`] nit huggingface/transformers#44971
- `Qwen2VL` huggingface/transformers#44976

Sixth N:
- `attention_chunk_size` in `Llama4TextConfig` huggingface/transformers#45002
- `_further_process_kwargs` huggingface/transformers#45033
- `model_type` in `AutoConfig.from_pretrained` huggingface/transformers#45058

vLLM:
First 10:
- `--rope-scaling` and `--rope-theta` #28006
- `rope_scaling` to `rope_parameters` in preparation for Transformers v5 #28542
- `partial_rotary_factor` from `rope_parameters` #29966
- `get_rope` to use `rope_parameters["partial_rotary_factor"]`, not `rotary_dim` #30389

Second 10:
- `httpx` logger less annoying when Transformers v5 is installed #30480
- `head_mask` from Ultravox and Swin #30764
- `HfHubHTTPError` in LoRA test #30768
- `position_embedding_type` will be present for BERT and RoBERTa models #30770
- `WeightRenaming` for Transformers modeling backend #31545
- `min_pixels`/`max_pixels` from Qwen2VL's processor #33208
- `tie_word_embeddings` for multimodal models in Transformers v5 #33359
- `return_dict` for `apply_chat_template` #33372

Third 10:
- `lm-eval` version for Transformers v5 compatibility #33994
- `mamba-ssm` version in CI for Transformers v5 compatibility #34233

Fourth 10:
Fifth 10:
- `padding_index` from models that don't use it for better Transformers v5 compatibility #35189
- `hf_override_fn` when it modifies `model_type` #35200
- `inputs_embeds` like Gemma 3 #36787
- `ExaoneMoeMTP` test that never ran in Transformers v4 #36792

Sixth 10:
- [`UltraVox`] Fix output type #37224
- `layer_type_validation` for Transformers v5 #37398

Seventh N:
- `SpeculatorsConfig` now that `PreTrainedConfig` is a `dataclass` in Transformers #37574

Model repos:
Merged - 10:
Unmerged - 13:
Other:
- `modify_gen_kwargs` in `vllm_vlms.py` EleutherAI/lm-evaluation-harness#3573